End-to-end support for coarser-than-daily count signals by cdc-mitzimorris · Pull Request #794 · CDCgov/PyRenew

cdc-mitzimorris · 2026-04-21T23:47:46Z

Overview

Adds support for count observations aggregated to a weekly grid while the renewal equation continues to be evaluated daily. Two design pieces:

Weekly observation likelihood path. Daily predicted counts are summed to the weekly observation grid before the likelihood is scored. Aggregation lives inside the numpyro-traced graph so the noise model and the likelihood operate on the same scale.
Independent parameter cadence. The temporal process for $\mathcal{R}(t)$ may be parameterized daily or weekly (or any stepwise cadence) and is broadcast to the daily axis. The renewal equation and all observation transforms run daily either way.

End-to-end flow for a weekly signal

$\mathcal{R}(t)$ sampled by a temporal process at the chosen cadence and broadcast to the daily model axis if needed.
Daily renewal equation consumes daily $\mathcal{R}(t)$ and produces the daily infection trajectory.
Daily predicted counts computed via ascertainment + delay convolution.
Daily predicted counts summed to the weekly observation grid via pyrenew.time.daily_to_weekly.
Likelihood scored at weekly scale against weekly observations.

Relationship to pyrenew-hew

The production pyrenew-hew model parameterizes $\mathcal{R}(t)$ weekly and aggregates daily predicted hospital admissions to a weekly grid. This PR brings the same capability into PyRenew while making parameter cadence a user choice rather than a fixed coupling:

Match pyrenew-hew today: wrap the inner temporal process in StepwiseTemporalProcess(step_size=7, alignment="calendar_week", week_start_dow=...).
Daily parameterization for the same signals: drop the wrapper and use the inner process directly.
Mixed cadences: weekly hospital admissions and daily ED visits in one model regardless of the Rt parameter cadence.

Observation cadence and parameter cadence are independent design choices; the builder no longer enforces a pairing rule between them.

Reviewer guide

Review bottom-up through the dependency chain. Each unit's changes are self-contained.

1. Synthetic data refresh (120 → 126 days)

Files: datagen_he_CA_126.py, synthetic_CA_126/*.csv, synthetic_data.py, test_datagen_he_CA_126.py, test_datasets_synthetic.py.
Six extra days yields 18 full weeks. No API change.
Verify true_parameters.json matches the generating process.

2. Observation base validators — pyrenew/observation/base.py

New: _validate_aggregation_params, _compute_period_offset, _validate_period_end_times.
Generalized _validate_shapes_match replaces _validate_obs_times_shape.
_validate_dow_effect now uses the shared require_shape helper.
Focus: offset arithmetic (period_end_dow + 1 - first_day_dow) % 7 and period-boundary alignment.

3. Latent: temporal processes — pyrenew/latent/temporal_processes.py

New class StepwiseTemporalProcess; step_size attr added to the TemporalProcess Protocol and to AR1 / DifferencedAR1 / RandomWalk.
Alignment options: "model_index" (default) starts blocks at model index 0; "calendar_week" aligns weekly blocks to a declared week_start_dow.
Calendar-week broadcast delegates to pyrenew.time.weekly_to_daily; coarse trajectory recorded as {name_prefix}_coarse.
first_day_dow threaded through the Protocol so calendar-aligned wrappers can use the model-axis day-of-week; standard processes ignore it.

4. Latent: shape contracts — pyrenew/latent/{population,subpopulation}_infections.py

Both sample() methods accept first_day_dow and forward it to their temporal processes.
Output shape of every temporal process is validated via the shared pyrenew.arrayutils.require_shape helper.

5. Count observation path — pyrenew/observation/count_observations.py

New ctor params aggregation_period, reporting_schedule, period_end_dow.
New _aggregate wraps pyrenew.time.daily_to_weekly.
Branching in validate_data / sample for regular vs. irregular schedules.
Confirm all in-tree SubpopulationCounts call sites updated; times → period_end_times rename consistent.

6. Measurement observations — pyrenew/observation/measurement_observations.py

One-line refactor to use the renamed shape validator.

7. Builder + model coherence — pyrenew/model/{pyrenew_builder,multisignal_model}.py

MultiSignalModel.sample() accepts first_day_dow and forwards it to the latent process.
_validate_coherence enforces calendar-anchor and structural coherence:
- All observations sharing an aggregation_period > 1 must agree on period_end_dow.
- Every temporal-process step_size must be a positive integer.
- Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.
first_day_dow required at validate_data when any obs has aggregation_period > 1.
The previous "step_size ≤ finest aggregation_period" rule was deliberately removed — see the in-body comment.

8. Integration test — test/integration/test_population_infections_he_weekly.py

End-to-end MCMC: weekly admissions + daily ED visits, calendar-week-aligned weekly Rt.
Prior-predictive shape check and posterior recovery for I0 + baseline R(t).
Uses numpyro.enable_x64() + set_host_device_count(4).

9. Test fixtures + config — test/conftest.py, pyproject.toml, _typos.toml

196-line conftest promotes repeated setup to fixtures, including reusable temporal-process stubs (WrongShapeTemporalProcess, ConstantTemporalProcess, InvalidStepSizeTemporalProcess).
Pytest integration marker added so -m "not integration" skips MCMC tests.

10. Tutorial — docs/tutorials/building_multisignal_models.qmd

New section explaining the three independent choices: parameter cadence, model time axis, observation cadence.
Worked example pairing weekly Rt (alignment="calendar_week", week_start_dow=6) with weekly observations (period_end_dow=5).

Where to focus review attention

Offset math in _compute_period_offset and the period-boundary check in _validate_period_end_times — these govern correctness of weekly alignment.
StepwiseTemporalProcess calendar-week broadcasting — weekly_to_daily is reused; sanity check with n_timepoints=17, first_day_dow=3, week_start_dow=6 produces 3 coarse samples broadcast to [c0×3, c1×7, c2×7].
The three coherence rules in _validate_coherence — each has pass + distinct-failure tests in test_pyrenew_builder.py.
CountObservation._aggregate runs inside the numpyro-traced graph — likelihood scoring is at weekly scale, not post-hoc.

Incidental fixes (not directly tied to #789)

Found while implementing aggregation; small enough that separating them would create churn.

_validate_index_array empty-array guard (observation/base.py) — jnp.any(indices < 0) on empty arrays returned False; now returns early on size == 0.
_validate_index_array / _validate_obs_dense ndim check — previously accepted non-1D arrays and relied on silent broadcasting; now rejects with a clear error.
AR(p) stationary-SD bound in test_ar_process_asymptotics — the bound |long_ts[-1]| < 3 * noise_sd was too tight (stationary SD strictly > innovation SD when autoregressive coefficients are non-zero) and latently flaky; replaced with closed-form stationary SD per order.
_validate_obs_times_shape → _validate_shapes_match rename — prerequisite refactor; same shape-match logic needed for (obs, period_end_times) pairs.
_typos.toml: allow dows. pyproject.toml: register integration pytest marker.

Copilot

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cdc-mitzimorris · 2026-04-23T21:39:46Z

@dylanhmorris and @damonbayer - this is ready for review.
there's a lot of changes to look at via the github interface - here's an analysis of how much of this is code changes

PR Analysis: `mem_789_temporal_aggregation` vs `main`

Excluding pyrenew/datasets/. "other" = blank lines and # comments.

Top-level totals

area	doc+	doc-	code+	code-	other+	other-
pyrenew/ (sans datasets)	644	90	487	67	113	13
test/	713	22	1,943	94	473	2

pyrenew/ per file (sorted by code churn)

file	doc+	doc-	code+	code-	other+	other-
`observation/count_observations.py`	283	51	218	52	35	13
`latent/temporal_processes.py`	114	0	85	1	23	0
`observation/base.py`	89	12	76	10	18	0
`model/multisignal_model.py`	46	15	39	3	7	0
`model/pyrenew_builder.py`	20	5	37	0	7	0
`time.py`	54	3	11	0	19	0
`latent/subpopulation_infections.py`	10	2	10	0	0	0
`latent/population_infections.py`	9	2	4	0	0	0
`arrayutils.py`	15	0	3	0	4	0
`observation/measurement_observations.py`	4	0	2	1	0	0
`latent/__init__.py`	0	0	2	0	0	0

test/ per file (sorted by code churn)

file	doc+	doc-	code+	code-	other+	other-
`test_observation_counts.py`	63	0	482	8	87	0
`test_pyrenew_builder.py`	67	1	312	10	45	0
`test_observation_validation.py`	53	9	221	30	54	2
`integration/test_population_infections_he_weekly.py`	165	0	223	0	70	0
`integration/test_population_infections_he_weekly_rt.py`	128	0	193	0	62	0
`test_temporal_processes.py`	22	0	153	1	33	0
`conftest.py`	142	0	100	0	71	0
`integration/conftest.py`	47	0	93	3	21	0
`test_datagen_he_CA_126.py`	8	6	32	24	5	0
`test_population_infections.py`	2	0	37	1	6	0
`test_subpopulation_infections.py`	2	0	38	0	4	0
`test_time.py`	8	0	32	0	9	0
`integration/test_population_infections_he.py`	1	1	7	9	1	0
`test_ar_process.py`	0	0	13	1	5	0
`test_datasets_synthetic.py`	5	5	7	7	0	0

Notes on the split

The code-vs-docstring split uses the Python AST of each file version to identify Module/ClassDef/FunctionDef leading string-literal bodies; every +/- line is classified by its line number in the respective revision. Triple-quoted strings that aren't docstrings (e.g., multiline literals inside function bodies) count as code.
Code churn is dominated by count_observations.py (the irregular/weekly aggregation landing for Handle temporal aggregation in observational data #789) and temporal_processes.py. observation/base.py and multisignal_model.py carry the dispatch plumbing for first_day_dow / obs_start_date. time.py and arrayutils.py are mostly docstring (new module-level prose around small helpers).

dylanhmorris · 2026-04-24T14:38:51Z

+        between R(t) parametrization cadence and observation aggregation.
    """

+    step_size: int


Given docs above about default. Consider setting to 1 in the base class and then overriding in inherited classes when appropriate? Open to the more explicit approach as well though.

@damonbayer thoughts?

agreed we can do this either way, but there are a few reasons to keep it as is:

TemporalProcess is a typing.Protocol, which is meant to describe a structural shape, not to provide default values.

Yes, the current code has 3 lines of boilerplate but they're in the right place - reading AR1 on its own tells you its step_size is 1. design precept: explicit > implicit (PEP 20).

If someone wants to add another TemporalProcess, seeing that the existing subclasses declare step_size explicitly is a prompt to consider what behavoir applies; otherwise, they might overlook an inappropriate default.

dylanhmorris · 2026-04-24T14:39:21Z

        more volatile trajectories; smaller values produce smoother ones.
    """

+    step_size: int = 1


See above about potential to avoid with inheritance.

dylanhmorris · 2026-04-24T20:20:49Z

All observations sharing an aggregation_period > 1 must agree on period_end_dow.
Every temporal-process step_size must be a positive integer.
Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.

High-level comment: do we definitely need to enforce all weekly quantities sharing the same week?

I agree that in many cases a user will want this, but it's not a given. For example, you could imagine two weekly aggregate observables, one reported in MMWR epiweeks, the other in isoweeks. Similarly, while matching temporal process weeks to observation weeks makes sense to me as a default, I don't think we should enforce it.
https://github.com/CDCgov/PyRenew/pull/794/changes#r3140086415

damonbayer · 2026-04-27T18:08:15Z

+    inner
+        Inner ``TemporalProcess`` that generates the coarse-scale
+        trajectory. Must satisfy the ``TemporalProcess`` protocol.
+    step_size


A few comments here.

The interdependencies here feel bad.

alignment = "calendar_week" => step_size = 7 and week != None
alignment != "calendar_week" => week = None

Is there a better way to structure this?

Can / should we use "week" instead of "calendar week"?

Do we really need non-None week for alignment = "calendar_week", or could we use the data to inform what the week should be?

damonbayer · 2026-04-27T18:39:01Z

+        for value in vars(self).values():
+            if (
+                isinstance(value, TemporalProcess)
+                and getattr(value, "alignment", None) == "calendar_week"
+            ):
+                return True
+        return False


Cleaner to use a generator expression?

Suggested change

for value in vars(self).values():

if (

isinstance(value, TemporalProcess)

and getattr(value, "alignment", None) == "calendar_week"

):

return True

return False

return any(

isinstance(value, TemporalProcess)

and getattr(value, "alignment", None) == "calendar_week"

for value in vars(self).values()

)

damonbayer · 2026-04-27T18:52:34Z

        subpop_fractions
            Population fractions. Defaults to ``[1.0]`` (single population).
            Must be ``[1.0]`` if provided.
+        first_day_dow


When does the first_day occur in terms of model time? Can we be more clear about that?

damonbayer · 2026-04-27T19:01:39Z

+            f"R(t) 90% CI coverage was {coverage:.1%}, expected >= 80%"
+        )
+
+    def test_ascertainment_rates_recover_order_of_magnitude(


How did you decide which parameters to assess and why do you use different assessments? (95% coverage vs within a factor of 5)

cdc-mitzimorris added 30 commits September 15, 2025 18:24

Merge branch 'main' of https://github.com/CDCgov/PyRenew

680bb1e

update

2cb876b

Merge branch 'main' of github-bf06:CDCgov/PyRenew

60db8df

Merge branch 'main' of github-bf06:CDCgov/PyRenew

32a5314

Merge branch 'main' of github-bf06:CDCgov/PyRenew

d6213f2

Merge branch 'main' of github-bf06:CDCgov/PyRenew

96f27c9

Merge branch 'main' of github-bf06:CDCgov/PyRenew

1cb6fa2

Merge branch 'main' of github-bf06:CDCgov/PyRenew

f62e1e4

Merge branch 'main' of github-bf06:CDCgov/PyRenew

0c6785d

Merge branch 'main' of github-bf06:CDCgov/PyRenew

1ee62b9

Merge branch 'main' of github-bf06:CDCgov/PyRenew

0629461

Merge branch 'main' of github-bf06:CDCgov/PyRenew

efeadee

Merge branch 'main' of github-bf06:CDCgov/PyRenew

371ba98

Merge branch 'main' of github-bf06:CDCgov/PyRenew

0304bed

Merge branch 'main' of github-bf06:CDCgov/PyRenew

ffeea65

Merge branch 'main' of github-bf06:CDCgov/PyRenew

50e7261

Merge branch 'main' of github-bf06:CDCgov/PyRenew

dae6af8

Merge branch 'main' of github-bf06:CDCgov/PyRenew

5cb3097

Merge branch 'main' of github-bf06:CDCgov/PyRenew

1d80ccc

Merge branch 'main' of github-bf06:CDCgov/PyRenew

e73b401

Merge branch 'main' of github-bf06:CDCgov/PyRenew

b1473b5

Merge branch 'main' of github-bf06:CDCgov/PyRenew

0b929b5

Merge branch 'main' of github-bf06:CDCgov/PyRenew

3ee00a7

Merge branch 'main' of github-bf06:CDCgov/PyRenew

307982a

Merge branch 'main' of github-bf06:CDCgov/PyRenew

b862bc6

Merge branch 'main' of github-bf06:CDCgov/PyRenew

2c665a5

Merge branch 'main' of github-bf06:CDCgov/PyRenew

60d6458

Merge branch 'main' of github-bf06:CDCgov/PyRenew

ec8c464

Merge branch 'main' of github-bf06:CDCgov/PyRenew

c018bf7

Merge branch 'main' of github-bf06:CDCgov/PyRenew

d0207dd

fix tutorial

529c494

cdc-mitzimorris requested a review from Copilot April 23, 2026 20:45

Copilot started reviewing on behalf of cdc-mitzimorris April 23, 2026 20:45 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread pyrenew/observation/count_observations.py Outdated

changes per copilot review

7a67e8f

fix logic on obs_start_date for calendar alignment

5527261